An Empirical Evaluation of LFG-DOP
Abstract
This paper presents an empirical assessment of the LFG-DOP model introduced by Bod & Kaplan (1998). The parser we describe uses fragments from LFG-annotated sentences to parse new sentences and Monte Carlo techniques to compute the most probable parse. While our main goal is to test Bod & Kaplan's model, we will also test a version of LFG-DOP which treats generalized fragments as previously unseen events. Experiments with the Verbmobil and Homecentre corpora show that our version of LFG-DOP outperforms Bod & Kaplan's model, and that LFG's functional information improves the parse accuracy of tree structures.

1 Introduction

We present an empirical evaluation of the LFG-DOP model introduced by Bod & Kaplan (1998). LFG-DOP is a Data-Oriented Parsing (DOP) model (Bod 1993, 98) based on the syntactic representations of Lexical-Functional Grammar (Kaplan & Bresnan 1982). A DOP model provides linguistic representations for an unlimited set of sentences by generalizing from a given corpus of annotated exemplars. It operates by decomposing the given representations into (arbitrarily large) fragments and recomposing those pieces to analyze new sentences. The occurrence-frequencies of the fragments are used to determine the most probable analysis of a sentence.

So far, DOP models have been implemented for phrase-structure trees and logical-semantic representations (cf. Bod 1993, 98; Sima'an 1995, 99; Bonnema et al. 1997; Goodman 1998). However, these DOP models are limited in that they cannot account for underlying syntactic and semantic dependencies that are not reflected directly in a surface tree. DOP models for a number of richer representations have been explored (van den Berg et al. 1994; Tugwell 1995), but these approaches have remained context-free in their generative power. In contrast, Lexical-Functional Grammar (Kaplan & Bresnan 1982) is known to be beyond context-free.
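To make concrete how fragment occurrence-frequencies can yield a parse probability, here is a minimal Python sketch. It assumes the simple relative-frequency estimator familiar from tree-based DOP (a fragment's count divided by the total count of fragments with the same root category); the model evaluated in this paper may condition on more than that. The fragment bank, the fragment labels, and the function names are all hypothetical.

```python
from collections import Counter
from math import prod

# Hypothetical toy fragment bank: (root category, fragment label) -> count.
fragment_counts = Counter({
    ("S", "S -> NP VP"): 6,
    ("S", "S -> NP[she] VP"): 2,
    ("NP", "NP -> she"): 3,
    ("NP", "NP -> Kim"): 1,
    ("VP", "VP -> sleeps"): 4,
})

def fragment_probability(fragment):
    """Relative frequency of a fragment among fragments sharing its root.

    Assumption: this is the plain relative-frequency estimator, used here
    only for illustration.
    """
    root = fragment[0]
    same_root_total = sum(
        count for (r, _), count in fragment_counts.items() if r == root
    )
    return fragment_counts[fragment] / same_root_total

def derivation_probability(derivation):
    """A derivation is a sequence of fragments; multiply their probabilities."""
    return prod(fragment_probability(f) for f in derivation)

def parse_probability(derivations):
    """A parse may have several derivations; sum their probabilities."""
    return sum(derivation_probability(d) for d in derivations)

# Two different derivations that build the same analysis of "she sleeps".
d1 = [("S", "S -> NP VP"), ("NP", "NP -> she"), ("VP", "VP -> sleeps")]
d2 = [("S", "S -> NP[she] VP"), ("VP", "VP -> sleeps")]
p_parse = parse_probability([d1, d2])
```

Because one analysis can be built from small or large fragments, its probability is spread over several derivations, which is why the most probable parse is not simply the parse of the most probable derivation.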
In Bod & Kaplan (1998), a first DOP model was proposed based on representations defined by LFG theory ("LFG-DOP").¹ This model was studied from a mathematical perspective by Cormons (1999), who also accomplished a first simple experiment with LFG-DOP. Next, Way (1999) studied LFG-DOP as an architecture for machine translation. The current paper contains the first extensive empirical evaluation of LFG-DOP on the currently available LFG-annotated corpora: the Verbmobil corpus and the Homecentre corpus. Both corpora were annotated at Xerox PARC.

¹ DOP models have recently also been proposed for Tree-Adjoining Grammar and Head-driven Phrase Structure Grammar (cf. Neumann & Flickinger 1999).

Our parser uses fragments from LFG-annotated sentences to parse new sentences, and Monte Carlo techniques to compute the most probable parse. Although our main goal is to test Bod & Kaplan's LFG-DOP model, we will also test a modified version of LFG-DOP which uses a different model for computing fragment probabilities. While Bod & Kaplan treat all fragments as probabilistically equal regardless of whether they contain generalized features, we will propose a more fine-grained probability model which treats fragments with generalized features as previously unseen events and assigns probabilities to these fragments by means of discounting. The experiments indicate that our probability model outperforms Bod & Kaplan's probability model on the Verbmobil and Homecentre corpora.

The rest of this paper is organized as follows: we first summarize the LFG-DOP model and go into our proposed extension. Next, we explain the Monte Carlo parsing technique for estimating the most probable LFG parse of a sentence. In section 3, we test our parser on sentences from the LFG-annotated corpora.
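The idea behind Monte Carlo estimation of the most probable parse can be sketched as follows: sample derivations at random in proportion to their probabilities, record which parse each sampled derivation yields, and return the parse seen most often. This is only an illustration under simplifying assumptions. It enumerates the derivations explicitly, whereas a real parser would sample them from a chart, and the derivation list, probabilities, and function name are hypothetical.

```python
import random
from collections import Counter

def most_probable_parse(derivations, n_samples=2000, seed=0):
    """Estimate the most probable parse by sampling derivations.

    derivations: list of (parse_id, derivation_probability) pairs.
    Assumption: all derivations are listed explicitly; this toy setup
    stands in for sampling from a parse chart.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    parse_ids = [parse_id for parse_id, _ in derivations]
    weights = [probability for _, probability in derivations]
    sampled = rng.choices(parse_ids, weights=weights, k=n_samples)
    counts = Counter(sampled)
    best_parse, _ = counts.most_common(1)[0]
    return best_parse, counts

# Hypothetical example: parse_B owns the single most probable derivation
# (0.30), but parse_A has more total probability mass (0.70).
toy_derivations = [
    ("parse_A", 0.25),
    ("parse_A", 0.25),
    ("parse_A", 0.20),
    ("parse_B", 0.30),
]
best, counts = most_probable_parse(toy_derivations)
```

The relative frequency of each parse among the samples approximates its total probability, so the estimate improves as `n_samples` grows.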
2 Summary of LFG-DOP and an Extension

In accordance with Bod (1998), a particular DOP model is described by specifying settings for the following four parameters:

• a formal definition of a well-formed representation for utterance analyses,
• a set of decomposition operations that divide a given utterance analysis into a set of fragments,
• a set of composition operations by which such fragments may be recombined to derive an analysis of a new utterance, and
• a probability model that indicates how the probability of a new utterance analysis is computed.
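As an illustration of the fourth parameter, the following Python sketch shows one way a probability model could treat generalized fragments as previously unseen events and give them probability mass by discounting. The text above does not specify the discounting scheme, so everything concrete here is an assumption: the reserved mass is taken to be the share of fragments seen exactly once (a Good-Turing-style estimate), it is split uniformly over the generalized fragments, and the fragment names and function name are hypothetical.

```python
from collections import Counter

def discounted_probabilities(seen_counts, generalized_fragments):
    """Assign probabilities to seen and generalized fragments.

    Assumptions (not taken from the paper):
      * reserved mass for unseen events = n1 / N, where n1 is the number
        of fragment types seen exactly once and N the total count;
      * the reserved mass is shared uniformly by generalized fragments.
    """
    total = sum(seen_counts.values())
    seen_once = sum(1 for count in seen_counts.values() if count == 1)
    unseen_mass = seen_once / total if generalized_fragments else 0.0

    # Seen fragments keep their relative frequency, scaled down to make room.
    probabilities = {
        fragment: (1.0 - unseen_mass) * count / total
        for fragment, count in seen_counts.items()
    }
    # Generalized fragments share the reserved mass.
    for fragment in generalized_fragments:
        probabilities[fragment] = unseen_mass / len(generalized_fragments)
    return probabilities

# Hypothetical competition set: four observed fragments, two generalized ones.
seen = Counter({"f1": 5, "f2": 3, "f3": 1, "f4": 1})
generalized = ["g1", "g2"]
probs = discounted_probabilities(seen, generalized)
```

In this toy set two of ten fragment tokens are singletons, so 0.2 of the probability mass is reserved for generalized fragments and the remaining 0.8 is distributed over the observed fragments by relative frequency.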
Similar resources
LFG-DOT: Combining Constraint-Based and Empirical Methodologies for Robust MT
The Data-Oriented Parsing Model (DOP, [1]; [2]) has been presented as a promising paradigm for NLP. It has also been used as a basis for Machine Translation (MT): Data-Oriented Translation (DOT, [9]). Lexical Functional Grammar (LFG, [5]) has also been used for MT ([6]). LFG has recently been allied to DOP to produce a new LFG-DOP model ([3]) which improves the robustness of LFG. We summarize ...
A Data-Oriented Parsing Model for Lexical-Functional Grammar
Data-Oriented Parsing (DOP) models of natural language propose that human language processing works with representations of concrete past language experiences rather than with abstract linguistic rules. These models operate by decomposing the given representations into fragments and recomposing those pieces to analyze new utterances. A probability model is used to select from all possible analy...
An Improved Parser for Data-Oriented Lexical-Functional Analysis
We present an LFG-DOP parser which uses fragments from LFG-annotated sentences to parse new sentences. Experiments with the Verbmobil and Homecentre corpora show that (1) Viterbi n best search performs about 100 times faster than Monte Carlo search while both achieve the same accuracy; (2) the DOP hypothesis which states that parse accuracy increases with increasing fragment size is confirmed f...
A DOP model for Lexical-Functional Grammar
It is well-known that there exist many syntactic and semantic dependencies that are not reflected directly in a surface tree. All modern linguistic theories propose more articulated representations and mechanisms in order to characterize such linguistic phenomena. The Tree-DOP model is thus limited in that it cannot account for these phenomena. In this chapter, we show how a DOP model can be de...
Structured Parameter Estimation for LFG-DOP using Backoff
Despite its state-of-the-art performance, the Data Oriented Parsing (DOP) model has been shown to suffer from biased parameter estimation, and the good performance seems more the result of ad hoc adjustments than correct probabilistic generalization over the data. In recent work, we developed a new estimation procedure, called Backoff Estimation, for DOP models that are based on Phrase-Structur...
Publication date: 2000